Digital Libraries and Document Image Analysis
Abstract
The rapid growth of digital libraries (DLs) worldwide poses many new challenges for document image analysis (DIA) research and development. DLs promise to offer more people access to larger document collections, and at far greater speed, than physical libraries can. But DLs also tend, for many reasons, to serve poorly, or even to omit entirely, many types of non-digital human-legible media, such as originally printed and handwritten documents. In their original physical (undigitized) form, these media are readily (if not always quickly) legible, searchable, and browsable. As document images accessed through DLs, however, they often lose many of their original advantages while, of course, still lacking many advantages of symbolically encoded information. The author explores these issues and illustrates them with brief case studies drawn from his experience as a DIA researcher collaborating with several DL projects in the US. Difficult open technical problems for DIA in DL applications are identified in the contrasting advantages of paper and digital displays; at every stage of capture, early processing, recognition, analysis, presentation, and retrieval; and in personal and interactive applications. These findings support the conclusion that the international DIA R&D community is urgently needed, because it is uniquely qualified, to provide new technology that can help rescue the world's vast, culturally irreplaceable legacy paper document collections from neglect and, in many cases, from eventual oblivion.
Similar Resources
Digital Libraries and Document Image Analysis Techniques: a Survey
Nowadays, Digital Libraries have become a widely used service to store and share both born-digital documents and digital versions of works held by traditional libraries. Document images are intrinsically unstructured, and the structure and semantics of digitized documents are for the most part lost during the conversion. Several techniques related to the Document Image Analysis research area ha...
Digital Libraries and Document Image Retrieval Techniques: A Survey
Nowadays, Digital Libraries have become a widely used service to store and share both born-digital documents and digital versions of works held by traditional libraries. Document images are intrinsically unstructured, and the structure and semantics of digitized documents are for the most part lost during the conversion. Several techniques related to the Document Image Analysis research area ha...
Digitizing a Million Books: Challenges for Document Analysis
This paper describes the challenges facing the document image analysis community in building large digital libraries with diverse document categories. The challenges are identified from experience with the ongoing activities toward digitizing and archiving one million books. A smooth workflow has been established for archiving large quantities of books, with the help of efficient image processing algorithms....
Tools for Document Image Retrieval in Digital Libraries: the AIDI System
In the last few years, Digital Libraries have become an important application area for Document Image Analysis and Recognition research [1]. In this field, a relevant line of research is Document Image Retrieval (DIR), which aims at finding relevant documents relying on image features only. DIR techniques are used to index not only the textual content of a document, but also its layout, graphical obj...
Document Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)
Document images produced by scanners or digital cameras usually suffer from geometric and photometric distortions, both of which degrade the performance of OCR systems. In this paper, we present a novel method to compensate for undesirable geometric distortions, aiming to improve OCR results. Our methodology is based on finding text lines by a dynamic local connectivity map and then applying a l...
The Role of Digital Reality Technologies in Libraries: A Systemic Review
Introduction: Fourth-generation libraries can no longer be satisfied with web software facilities and must use technologies, including digital realities, to increase the level of service and attract clients. Therefore, this review aimed to identify the roles and effects of these technologies in libraries. Methods: In this systematic review, PubMed, Web of Science, Scopus, and Google Scholar adv...